
    Bundle-Level Type Methods Uniformly Optimal for Smooth and Nonsmooth Convex Optimization

    The main goal of this paper is to develop uniformly optimal first-order methods for convex programming (CP). By uniform optimality we mean that the first-order methods themselves do not require the input of any problem parameters, but can still achieve the best possible iteration complexity bounds. By incorporating a multi-step acceleration scheme into the well-known bundle-level method, we develop an accelerated bundle-level (ABL) method and show that it can achieve the optimal complexity for solving a general class of black-box CP problems without requiring the input of any smoothness information, such as whether the problem is smooth, nonsmooth, or weakly smooth, or the specific values of the Lipschitz constant and smoothness level. We then develop a more practical, restricted-memory version of this method, namely the accelerated prox-level (APL) method. We investigate the generalization of the APL method for solving certain composite CP problems and an important class of saddle-point problems recently studied by Nesterov [Mathematical Programming, 103 (2005), pp. 127-152]. We present promising numerical results for these new bundle-level methods applied to certain classes of semidefinite programming (SDP) and stochastic programming (SP) problems.
    Comment: A combination of two papers previously submitted to Mathematical Programming, i.e., "Bundle-type methods uniformly optimal for smooth and nonsmooth convex optimization" (December 2010) and "Level methods uniformly optimal for composite and structured nonsmooth convex optimization" (April 2011).
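
    The level-set machinery these methods build on can be illustrated with the classical (non-accelerated) bundle-level iteration: accumulate linear cuts, compute a lower bound by minimizing the cutting-plane model, set a level between that bound and the best value seen, and project the current iterate onto the level set of the model. The sketch below shows that basic scheme on a Euclidean ball, assuming cvxpy is available for the two subproblems; it is not the ABL/APL method of the paper, and the names and the parameter lam = 0.5 are illustrative choices only.

```python
import numpy as np
import cvxpy as cp

def bundle_level(f, subgrad, x0, radius, iters=50, lam=0.5):
    """Classical bundle-level sketch for minimizing f over the ball ||x|| <= radius."""
    n = x0.size
    cuts = []                      # linear cuts (f(x_t), g_t, x_t)
    x, f_best = x0.copy(), np.inf
    for _ in range(iters):
        fx, g = f(x), subgrad(x)
        f_best = min(f_best, fx)
        cuts.append((fx, g, x.copy()))
        z = cp.Variable(n)
        model = cp.max(cp.hstack([fv + gv @ (z - xv) for fv, gv, xv in cuts]))
        # lower bound: minimize the cutting-plane model over the feasible ball
        f_low = cp.Problem(cp.Minimize(model), [cp.norm(z) <= radius]).solve()
        level = f_low + lam * (f_best - f_low)
        # next iterate: project the current point onto the level set of the model
        cp.Problem(cp.Minimize(cp.sum_squares(z - x)),
                   [model <= level, cp.norm(z) <= radius]).solve()
        x = z.value
    return x, f_best

# example: minimize ||x - b||_1 over a ball; sign(x - b) is a valid subgradient
b = np.ones(5)
x_star, val = bundle_level(lambda x: np.abs(x - b).sum(),
                           lambda x: np.sign(x - b), np.zeros(5), radius=3.0)
```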

    Gradient Sliding for Composite Optimization

    We consider in this paper a class of composite optimization problems whose objective function is given by the summation of a general smooth and nonsmooth component, together with a relatively simple nonsmooth term. We present a new class of first-order methods, namely the gradient sliding algorithms, which can skip the computation of the gradient for the smooth component from time to time. As a consequence, these algorithms require only ${\cal O}(1/\sqrt{\epsilon})$ gradient evaluations for the smooth component in order to find an $\epsilon$-solution for the composite problem, while still maintaining the optimal ${\cal O}(1/\epsilon^2)$ bound on the total number of subgradient evaluations for the nonsmooth component. We then present a stochastic counterpart for these algorithms and establish similar complexity bounds for solving an important class of stochastic composite optimization problems. Moreover, if the smooth component in the composite function is strongly convex, the developed gradient sliding algorithms can significantly reduce the number of gradient and subgradient evaluations for the smooth and nonsmooth component to ${\cal O}(\log(1/\epsilon))$ and ${\cal O}(1/\epsilon)$, respectively. Finally, we generalize these algorithms to the case when the smooth component is replaced by a nonsmooth one possessing a certain bilinear saddle point structure.
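
    The sliding idea can be sketched in a few lines: evaluate the gradient of the smooth component f once per outer step, then run several cheap subgradient steps on the nonsmooth component h while that gradient is held fixed. The snippet below is a simplified illustration under that reading, with placeholder stepsizes beta and gamma rather than the parameter schedule prescribed in the paper.

```python
import numpy as np

def gradient_sliding(grad_f, subgrad_h, x0, outer_iters=100, inner_iters=10,
                     beta=1.0, gamma=0.01):
    """One smooth gradient per outer iteration; many cheap subgradient steps on h."""
    x = x0.copy()
    for _ in range(outer_iters):
        g_f = grad_f(x)                 # expensive gradient of the smooth part, computed once
        u = x.copy()
        for _ in range(inner_iters):    # inner subgradient steps reuse the stored gradient
            u = u - gamma * (g_f + subgrad_h(u) + beta * (u - x))
        x = u
    return x
```

    The inner loop approximately solves the proximal subproblem min_u <grad f(x), u> + h(u) + (beta/2)||u - x||^2, which is how a single smooth gradient gets amortized over many subgradient evaluations.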

    Robust affine control of linear stochastic systems

    In this work, we provide a computationally tractable procedure for designing affine control policies, applied to constrained, discrete-time, partially observable, linear systems subject to set-bounded disturbances, stochastic noise, and potentially Markovian switching over a finite horizon. We investigate the situation in which performance specifications are expressed via averaged quadratic inequalities on the random state-control trajectory. Our methodology also applies to steering the density of the state-control trajectory under set-bounded uncertainty. Our developments are based on extending the notion of affine policies that are functions of the so-called "purified outputs" to the class of Markov jump linear systems. This re-parametrization of the set of policies induces a bi-affine structure in the state and control variables that can further be exploited via robust optimization techniques, with the approximate inhomogeneous S-lemma being the cornerstone. Tractability is understood in the sense that, for each type of performance specification considered, an explicit convex program for selecting the parameters specifying the control policy is provided. Our contributions to the existing literature on robust constrained control lie in addressing a wider class of systems than those already studied, by including Markovian switching, and in considering quadratic inequalities rather than just linear ones. Our work expands on previous investigations of finite-horizon covariance control by addressing the robustness issue and the possibility that the full state may not be available, thereby enabling the steering of the state-control trajectory density in the presence of disturbances under partial observation.
    Comment: 36 pages, 2 figures.

    Random gradient extrapolation for distributed and stochastic optimization

    In this paper, we consider a class of finite-sum convex optimization problems defined over a distributed multiagent network with $m$ agents connected to a central server. In particular, the objective function consists of the average of $m$ ($m \ge 1$) smooth components associated with each network agent together with a strongly convex term. Our major contribution is to develop a new randomized incremental gradient algorithm, namely the random gradient extrapolation method (RGEM), which does not require any exact gradient evaluation even for the initial point, but can achieve the optimal ${\cal O}(\log(1/\epsilon))$ complexity bound in terms of the total number of gradient evaluations of component functions to solve the finite-sum problems. Furthermore, we demonstrate that for stochastic finite-sum optimization problems, RGEM maintains the optimal ${\cal O}(1/\epsilon)$ complexity (up to a certain logarithmic factor) in terms of the number of stochastic gradient computations, but attains an ${\cal O}(\log(1/\epsilon))$ complexity in terms of communication rounds (each round involves only one agent). It is worth noting that the former bound is independent of the number of agents $m$, while the latter depends only linearly on $m$, or even on $\sqrt{m}$ for ill-conditioned problems. To the best of our knowledge, this is the first time that these complexity bounds have been obtained for distributed and stochastic optimization problems. Moreover, our algorithms were developed based on a novel dual perspective of Nesterov's accelerated gradient method.
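
    The per-iteration bookkeeping of incremental methods of this type, which touch only one randomly selected component gradient at a time, can be sketched with a SAGA-style gradient table (named plainly as such; this is not the RGEM recursion, whose gradient extrapolation and dual steps differ, but it shows why no full gradient pass is needed).

```python
import numpy as np

def incremental_gradient(component_grads, x0, steps=1000, lr=0.1, seed=0):
    """component_grads: list of per-component gradient oracles; one is queried per step."""
    m, x = len(component_grads), x0.copy()
    table = [np.zeros_like(x0) for _ in range(m)]   # last seen gradient of each component
    avg = np.zeros_like(x0)                          # running average of the table
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(m)                          # one randomly selected component
        g_new = component_grads[i](x)
        x = x - lr * (g_new - table[i] + avg)        # variance-reduced step
        avg += (g_new - table[i]) / m
        table[i] = g_new
    return x
```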

    Asynchronous decentralized accelerated stochastic gradient descent

    In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, in a setting where communication and synchronization are the major bottlenecks. We establish $\mathcal{O}(1/\epsilon)$ (resp., $\mathcal{O}(1/\sqrt{\epsilon})$) communication complexity and $\mathcal{O}(1/\epsilon^2)$ (resp., $\mathcal{O}(1/\epsilon)$) sampling complexity for solving general convex (resp., strongly convex) problems.

    Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming

    In this paper, we generalize Nesterov's well-known accelerated gradient (AG) method, originally designed for convex smooth optimization, to solve nonconvex and possibly stochastic optimization problems. We demonstrate that, by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems using first-order information, similar to the gradient descent method. We then consider an important class of composite optimization problems and show that the AG method can solve them uniformly, i.e., by using the same aggressive stepsize policy as in the convex case, even if the problem turns out to be nonconvex. We demonstrate that the AG method exhibits an optimal rate of convergence if the composite problem is convex, and improves the best known rate of convergence if the problem is nonconvex. Based on the AG method, we also present new nonconvex stochastic approximation methods and show that they can improve a few existing rates of convergence for nonconvex stochastic optimization. To the best of our knowledge, this is the first time that the convergence of the AG method has been established in the literature for solving nonconvex nonlinear programming problems.
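
    The accelerated-gradient template analyzed in this line of work maintains a "middle" point and takes two gradient steps of different lengths from it. A constant-stepsize sketch for a smooth function with Lipschitz constant L is given below; the constants are illustrative and are not the stepsize policy of the paper.

```python
import numpy as np

def accelerated_gradient(grad, x0, L, iters=100):
    """Accelerated-gradient template: middle point plus two gradient steps (illustrative constants)."""
    x = x0.copy()       # the "long-step" sequence
    x_ag = x0.copy()    # the aggregated (output) sequence
    for k in range(1, iters + 1):
        alpha = 2.0 / (k + 1)
        x_md = (1 - alpha) * x_ag + alpha * x   # middle point
        g = grad(x_md)                          # one gradient evaluation per iteration
        x = x - (k / (2.0 * L)) * g             # long step on the x-sequence
        x_ag = x_md - (1.0 / (2.0 * L)) * g     # short descent step producing the output
    return x_ag
```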

    Algorithms for stochastic optimization with functional or expectation constraints

    This paper considers the problem of minimizing an expectation function over a closed convex set, coupled with a functional or expectation constraint on either decision variables or problem parameters. We first present a new stochastic approximation (SA) type algorithm, namely the cooperative SA (CSA), to handle problems with the constraint on decision variables. We show that this algorithm exhibits the optimal ${\cal O}(1/\epsilon^2)$ rate of convergence, in terms of both optimality gap and constraint violation, when the objective and constraint functions are generally convex, where $\epsilon$ denotes the optimality gap and infeasibility. Moreover, we show that this rate of convergence can be improved to ${\cal O}(1/\epsilon)$ if the objective and constraint functions are strongly convex. We then present a variant of CSA, namely the cooperative stochastic parameter approximation (CSPA) algorithm, to deal with the situation when the constraint is defined over problem parameters, and show that it exhibits a similar optimal rate of convergence to CSA. It is worth noting that CSA and CSPA are primal methods which do not require iterations in the dual space and/or estimation of the size of the dual variables. To the best of our knowledge, this is the first time that such optimal SA methods for solving functional or expectation constrained stochastic optimization problems have been presented in the literature.
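
    The cooperative (switching) rule behind constrained stochastic approximation schemes of this kind can be sketched as follows: at each iteration, take a subgradient step along the constraint when its current estimate is violated beyond a tolerance, and along the objective otherwise, averaging only the "objective" iterates. The tolerance, stepsize, and averaging below are placeholders, not the schedules from the paper.

```python
import numpy as np

def cooperative_sa(obj_subgrad, con_val, con_subgrad, project, x0,
                   iters=1000, lr=0.01, tol=1e-2):
    """Switching-subgradient sketch for min f(x) s.t. g(x) <= 0 over a convex set."""
    x = x0.copy()
    sum_x, count = np.zeros_like(x0), 0
    for _ in range(iters):
        if con_val(x) > tol:          # constraint violated: step toward feasibility
            d = con_subgrad(x)
        else:                         # otherwise make progress on the objective
            d = obj_subgrad(x)
            sum_x, count = sum_x + x, count + 1
        x = project(x - lr * d)
    return sum_x / max(count, 1)
```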

    Dynamic Stochastic Approximation for Multi-stage Stochastic Optimization

    In this paper, we consider multi-stage stochastic optimization problems with convex objectives and conic constraints at each stage. We present a new stochastic first-order method, namely the dynamic stochastic approximation (DSA) algorithm, for solving these types of stochastic optimization problems. We show that DSA can achieve an optimal ${\cal O}(1/\epsilon^4)$ rate of convergence in terms of the total number of required scenarios when applied to a three-stage stochastic optimization problem. We further show that this rate of convergence can be improved to ${\cal O}(1/\epsilon^2)$ when the objective function is strongly convex. We also discuss variants of DSA for solving more general multi-stage stochastic optimization problems with the number of stages $T > 3$. The developed DSA algorithms only need to go through the scenario tree once in order to compute an $\epsilon$-solution of the multi-stage stochastic optimization problem. As a result, the memory required by DSA grows only linearly with respect to the number of stages. To the best of our knowledge, this is the first time that stochastic approximation type methods have been generalized for multi-stage stochastic optimization with $T \ge 3$.

    Randomized First-Order Methods for Saddle Point Optimization

    In this paper, we present novel randomized algorithms for solving saddle point problems whose dual feasible region is given by the direct product of many convex sets. Our algorithms can achieve ${\cal O}(1/N)$ and ${\cal O}(1/N^2)$ rates of convergence, respectively, for general bilinear saddle point and smooth bilinear saddle point problems based on a new primal-dual termination criterion, and each iteration of these algorithms needs to solve only one randomly selected dual subproblem. Moreover, these algorithms do not require strongly convex assumptions on the objective function and/or the incorporation of a strongly convex perturbation term, nor do they necessarily require the primal or dual feasible regions to be bounded or an estimate of the distance from the initial point to the set of optimal solutions to be available. We show that when applied to linearly constrained problems, these randomized primal-dual (RPD) algorithms are equivalent to certain randomized variants of the alternating direction method of multipliers (ADMM), while a direct extension of ADMM does not necessarily converge when the number of blocks exceeds two.
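
    For a concrete bilinear saddle point min_x max_y <Ax - b, y> + (lam/2)||x||^2 - (mu/2)||y||^2 (a regularized reformulation of a least-squares problem), the flavor of updating only one randomly chosen dual block per iteration can be sketched with a randomized PDHG-style step. This is a generic illustration under those assumptions, not the RPD algorithm or stepsize rules of the paper.

```python
import numpy as np

def randomized_primal_dual(A, b, lam=1.0, mu=1.0, blocks=4, iters=2000,
                           tau=0.1, sigma=0.1, seed=0):
    """Each iteration: closed-form prox update of one random dual block, then of the primal."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    x, y = np.zeros(n), np.zeros(m)
    x_bar = x.copy()
    block_idx = np.array_split(np.arange(m), blocks)   # dual coordinates split into blocks
    for _ in range(iters):
        rows = block_idx[rng.integers(blocks)]          # one randomly selected dual block
        y[rows] = (y[rows] + sigma * (A[rows] @ x_bar - b[rows])) / (1 + sigma * mu)
        x_new = (x - tau * (A.T @ y)) / (1 + tau * lam)
        x_bar = 2 * x_new - x                           # primal extrapolation
        x = x_new
    return x, y
```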

    An optimal randomized incremental gradient method

    In this paper, we consider a class of finite-sum convex optimization problems whose objective function is given by the summation of $m$ ($m \ge 1$) smooth components together with some other relatively simple terms. We first introduce a deterministic primal-dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal-dual termination criterion. Our major contribution is to develop a randomized primal-dual gradient (RPDG) method, which needs to compute the gradient of only one randomly selected smooth component at each iteration, but can possibly achieve better complexity than PDG in terms of the total number of gradient evaluations. More specifically, we show that the total number of gradient evaluations performed by RPDG can be ${\cal O}(\sqrt{m})$ times smaller, both in expectation and with high probability, than those performed by deterministic optimal first-order methods under favorable situations. We also show that the complexity of the RPDG method is not improvable, by developing a new lower complexity bound for a general class of randomized methods for solving large-scale finite-sum convex optimization problems. Moreover, through the development of PDG and RPDG, we introduce a novel game-theoretic interpretation for these optimal methods for convex optimization.